Poster: Jeff Kaplan Date: Mar 12, 2016 4:06pm
Forum: texts Subject: Re: How to make sure a scanned pdf file will be OCRed?

OK, your book is done now.

Poster: timlee Date: Mar 12, 2016 9:16pm
Forum: texts Subject: Re: How to make sure a scanned pdf file will be OCRed?

Thanks. But the OCRed text was not put back into a pdf file. Usually a pdf file with the OCR text is created, but not in this case.

Poster: Jeff Kaplan Date: Mar 13, 2016 9:22am
Forum: texts Subject: Re: How to make sure a scanned pdf file will be OCRed?

The system never alters the uploaded file, and since yours was detected as a text pdf, it would not create another one.

Poster: timlee Date: Mar 13, 2016 10:03am
Forum: texts Subject: Re: How to make sure a scanned pdf file will be OCRed?

I meant that a new pdf file is created with the original scanned images plus the OCR text, without altering the original pdf file.

Is that because the pdf file was detected as a text pdf instead of an image container pdf?
As I remember, there was no option for this when I uploaded the file.

The file was still OCRed, but the output is a text file; the text was not put into a new pdf file.
Now I have changed the metadata option from text pdf to image container pdf, but still no new pdf file combining the original scanned pages and the OCR text has been created. How can I make it create such a file?


The same problem happened with another file: https://archive.org/details/timlee126_yahoo_All_201603

Thanks.

This post was modified by timlee on 2016-03-13 17:03:04

Poster: Jeff Kaplan Date: Mar 13, 2016 12:17pm
Forum: texts Subject: Re: How to make sure a scanned pdf file will be OCRed?

Yes, it needs to be an image container to produce a text pdf. I reran it, and now there is one at https://archive.org/download/timlee126_yahoo_All_201603/all_text.pdf
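
A minimal Python sketch for checking an item yourself, assuming archive.org's public metadata endpoint (`https://archive.org/metadata/<identifier>`) lists each file with `name`, `source` and `format` fields, and that the format labels are "Image Container PDF" and "Text PDF" as discussed in this thread. It reports which format each uploaded pdf was recorded as and whether a derived text pdf exists. The helper names `fetch_files` and `pdf_status` are hypothetical.

```python
import json
import urllib.request

# Assumption: this endpoint returns JSON with a "files" list whose entries
# carry "name", "source" ("original" or "derivative") and "format" keys.
METADATA_URL = "https://archive.org/metadata/{identifier}"


def fetch_files(identifier):
    """Download the file list for one item (requires network access)."""
    with urllib.request.urlopen(METADATA_URL.format(identifier=identifier)) as resp:
        return json.load(resp).get("files", [])


def pdf_status(files):
    """Summarize the pdfs in an item's file list.

    Returns a dict mapping each uploaded (original) pdf to its recorded
    format, and a list of derived files whose format is "Text PDF".
    """
    originals = {
        f["name"]: f.get("format")
        for f in files
        if f.get("source") == "original" and f["name"].lower().endswith(".pdf")
    }
    text_pdfs = [
        f["name"]
        for f in files
        if f.get("source") == "derivative" and f.get("format") == "Text PDF"
    ]
    return originals, text_pdfs
```

For example, `pdf_status(fetch_files("timlee126_yahoo_Tmp1"))` would show whether that item's uploaded pdf was recorded as a text pdf (in which case, per the explanation above, no new one is derived) and whether any text pdf derivative exists yet.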

Poster: timlee Date: Mar 13, 2016 3:09pm
Forum: texts Subject: Re: How to make sure a scanned pdf file will be OCRed?

Thanks.

The first file still has no new pdf file created with the OCR text: https://archive.org/details/timlee126_yahoo_Tmp1
Can you rerun that one as well?

This has happened quite often recently, and I don't want to bother you every time.


Is it possible for me to mark an original pdf as an image container pdf so that it will be OCRed? (Both when I upload the file, and afterwards, when I find that no pdf file with OCR text has been created.)

This post was modified by timlee on 2016-03-13 21:58:06

This post was modified by timlee on 2016-03-13 22:09:07